-
Couldn't load subscription status.
- Fork 5
add streaming OLS implementation with duckdb arrow interop #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a streaming OLS implementation that leverages DuckDB's Arrow IPC support for memory-efficient regression on large datasets. The implementation uses sufficient statistics to process data in chunks rather than loading everything into memory.
- Adds streaming regression functionality with DuckDB Arrow interop
- Implements sufficient statistics-based OLS and Ridge regression
- Provides comprehensive test coverage for the streaming implementation
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| duckreg/stream.py | Core streaming regression implementation with Arrow IPC support |
| tests/test_stream.py | Test suite covering streaming regression functionality |
| duckreg/init.py | Exposes StreamingRegression in public API |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
|
||
| def solve_ols(self) -> np.ndarray: | ||
| """Compute OLS coefficients.""" | ||
| if self.n < self.k: |
Copilot
AI
Sep 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition self.n < self.k checks if there are fewer observations than features, but self.k could be None if no data has been processed yet. This will raise a TypeError when comparing int with NoneType.
| if self.n < self.k: | |
| if self.k is None or self.n < self.k: |
| feature_cols: list[str] = None, | ||
| target_col: str = None, |
Copilot
AI
Sep 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using mutable default arguments (None) should be replaced with proper Optional type annotations and default to None explicitly. The type hints should be Optional[list[str]] and Optional[str].
| feature_cols: list[str] = None, | |
| target_col: str = None, | |
| feature_cols: Optional[list[str]] = None, | |
| target_col: Optional[str] = None, |
No description provided.